{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "K7nBJnADTgUw"
   },
   "source": [
    "### You can also run the notebook in [COLAB](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "iPbAiv8KTgU4"
   },
   "source": [
    "# Goal-oriented bot in DeepPavlov"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "us6IsTUETgU5"
   },
   "source": [
    "This tutorial describes how to build a goal/task-oriented dialogue system with DeepPavlov framework. It covers the following steps:\n",
    "\n",
    "0. [Data preparation](#0.-Data-Preparation)\n",
    "1. [Build Database of items](#1.-Build-Database-of-items)\n",
    "2. [Build Slot Filler](#2.-Build-Slot-Filler)\n",
    "3. [Build and Train a Bot](#3.-Build-and-Train-a-Bot)\n",
    "4. [Interact with bot](#4.-Interact-with-Bot)\n",
    "\n",
    "An example of the final model served as a telegram bot:\n",
    "\n",
    "![gobot_example.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_example.png?raw=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 806
    },
    "colab_type": "code",
    "id": "Vtu-7ns2TgUz",
    "outputId": "8cdc252f-1a35-4ed3-bf0a-f54046d8c6a8"
   },
   "outputs": [],
   "source": [
    "!pip install deeppavlov\n",
    "!python -m deeppavlov install gobot_simple_dstc2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "4R066YWhTgU6"
   },
   "source": [
    "## 0. Data Preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gppbVe-HTgU7"
   },
   "source": [
    "In this tutorial we build a chatbot for restaurant booking. To train our chatbot we use [Dialogue State Tracking Challenge 2 (DSTC-2)](http://camdial.org/~mh521/dstc/) dataset. DSTC-2 provides dialogues of a human talking to a booking system labelled with slots and dialogue actions. These labels will be used for training a dialogue policy network.\n",
    "\n",
    "First of all let's take a quick look at the data for the task. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 137
    },
    "colab_type": "code",
    "id": "K9lF3QFJTgU8",
    "outputId": "6ab259e2-3f88-4b25-9371-21d3f38fcef3"
   },
   "outputs": [],
   "source": [
    "from deeppavlov.dataset_readers.dstc2_reader import SimpleDSTC2DatasetReader\n",
    "\n",
    "data = SimpleDSTC2DatasetReader().read('my_data')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 50
    },
    "colab_type": "code",
    "id": "uu56jAGJTgVD",
    "outputId": "1536bb2c-6c1f-45a6-c0a7-a92106ed7dfe"
   },
   "outputs": [],
   "source": [
    "!ls my_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "HmNmE80MTgVG"
   },
   "source": [
    "The training/validation/test data are stored in json files (`simple-dstc2-trn.json`, `simple-dstc2-val.json` and `simple-dstc2-tst.json`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "id": "LIm9DQyzTgVH",
    "outputId": "0a82c3f1-8afb-42d5-e3e3-0e9dd9178a20"
   },
   "outputs": [],
   "source": [
    "!head -n 101 my_data/simple-dstc2-trn.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "zO4CWg0XYNSw"
   },
   "source": [
    "To iterate over batches of preprocessed DSTC-2 we need to import `DatasetIterator`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "piBBcw9ZTgVK",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from deeppavlov.dataset_iterators.dialog_iterator import DialogDatasetIterator\n",
    "\n",
    "iterator = DialogDatasetIterator(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jVU5JGnTTgVM"
   },
   "source": [
    "You can now iterate over batches of preprocessed DSTC-2 dialogs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "id": "1RSwEH3CTgVN",
    "outputId": "b2a0ecdb-89d1-4784-eeb9-749f7b754ff6"
   },
   "outputs": [],
   "source": [
    "from pprint import pprint\n",
    "\n",
    "for dialog in iterator.gen_batches(batch_size=1, data_type='train'):\n",
    "    turns_x, turns_y = dialog\n",
    "    \n",
    "    print(\"User utterances:\\n----------------\\n\")\n",
    "    pprint(turns_x[0], indent=4)\n",
    "    print(\"\\nSystem responses:\\n-----------------\\n\")\n",
    "    pprint(turns_y[0], indent=4)\n",
    "    \n",
    "    break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "AKTZWtm8ZtPi"
   },
   "source": [
    "In real-life annotation of data is expensive. To make our tutorial closer to production use-cases we take  only 50 dialogues for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "UlappYTbTgVT"
   },
   "outputs": [],
   "source": [
    "!cp my_data/simple-dstc2-trn.json my_data/simple-dstc2-trn.full.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "tTU9yM-CTgVX",
    "outputId": "1568aaed-7f8e-4f77-a637-cda5a9556740"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "NUM_TRAIN = 50\n",
    "\n",
    "with open('my_data/simple-dstc2-trn.full.json', 'rt') as fin:\n",
    "    data = json.load(fin)\n",
    "with open('my_data/simple-dstc2-trn.json', 'wt') as fout:\n",
    "    json.dump(data[:NUM_TRAIN], fout, indent=2)\n",
    "print(f\"Train set is reduced to {NUM_TRAIN} dialogues (out of {len(data)}).\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "l5mjRphbTgVb"
   },
   "source": [
    "## 1. Build Database of items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "n597CLhqjqcd"
   },
   "source": [
    "### Building database of restaurants"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "nJFkgfjTTgVf"
   },
   "source": [
    "To assist with restaurant booking the chatbot should have access to a `database` of restaurants. The `database` contains task-specific information such as type of food, price range, location, etc.\n",
    "\n",
    "    >> database([{'pricerange': 'cheap', 'area': 'south'}])\n",
    "    \n",
    "    Out[1]: \n",
    "        [[{'name': 'the lucky star',\n",
    "           'food': 'chinese',\n",
    "           'pricerange': 'cheap',\n",
    "           'area': 'south',\n",
    "           'addr': 'cambridge leisure park clifton way cherry hinton',\n",
    "           'phone': '01223 244277',\n",
    "           'postcode': 'c.b 1, 7 d.y'},\n",
    "          {'name': 'nandos',\n",
    "           'food': 'portuguese',\n",
    "           'pricerange': 'cheap',\n",
    "           'area': 'south',\n",
    "           'addr': 'cambridge leisure park clifton way',\n",
    "           'phone': '01223 327908',\n",
    "           'postcode': 'c.b 1, 7 d.y'}]]\n",
    "           "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "rNpewHp-TgVd"
   },
   "source": [
    "&nbsp;\n",
    "![gobot_database.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_database.png?raw=1)\n",
    "&nbsp;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-TU-NLnNa9tk"
   },
   "source": [
    "The chatbot should be trained to make api calls. For this, training dataset contains a `\"db_result\"` dictionary key. It annotates turns where system performs an api call to the database of items. Rusulting value is stored in `\"db_result\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "EVNRZmeiTgVh",
    "outputId": "edba5e2b-235f-423f-8bfa-8d02506c4c7e"
   },
   "outputs": [],
   "source": [
    "!head -n 78 my_data/simple-dstc2-trn.json | tail +51"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "GT4YBHMnl0Xd"
   },
   "source": [
    "Set `primary_keys` to a list of slot names that have unique values for different items (common SQL term). For the case of DSTC-2, the primary slot is a restaurant name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "JjKbIAyaTgVk",
    "outputId": "07620401-80f5-490a-cff2-5d5f013a365b"
   },
   "outputs": [],
   "source": [
    "from deeppavlov.core.data.sqlite_database import Sqlite3Database\n",
    "\n",
    "database = Sqlite3Database(primary_keys=[\"name\"],\n",
    "                           save_path=\"my_bot/db.sqlite\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "a2e1u-z0TgVo"
   },
   "source": [
    "\n",
    "Let's find all `\"db_result\"` api call results and add them to our database of restaurants:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "RlKg5UtqTgVp",
    "outputId": "a387df1f-4418-498b-a125-9e351a8e0cf9"
   },
   "outputs": [],
   "source": [
    "db_results = []\n",
    "\n",
    "for dialog in iterator.gen_batches(batch_size=1, data_type='all'):\n",
    "    turns_x, turns_y = dialog\n",
    "    db_results.extend(x['db_result'] for x in turns_x[0] if x.get('db_result'))\n",
    "\n",
    "print(f\"Adding {len(db_results)} items.\")\n",
    "if db_results:\n",
    "    database.fit(db_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "XeJMI9qaTgVt"
   },
   "source": [
    "### Interacting with database"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "2JLUF2b_TgVu"
   },
   "source": [
    "We can now play with the database and make requests to it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "VRCU_MJnTgVv",
    "outputId": "017803c4-36ab-49bc-ae40-7df87356f5c2"
   },
   "outputs": [],
   "source": [
    "database([{'pricerange': 'cheap', 'area': 'south'}])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "U2wOAIlpTgV1",
    "outputId": "e83e53b9-3431-4d1c-9bed-0e841d2b6fc4"
   },
   "outputs": [],
   "source": [
    "!ls my_bot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "mBoO34NzTgV4"
   },
   "source": [
    "## 2. Build Slot Filler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TGlJRwTCYkiQ"
   },
   "source": [
    "`Slot Filler` is a component that finds slot values in user input:\n",
    "\n",
    "    >> slot_filler(['I would like some chineese food'])\n",
    "    \n",
    "    Out[1]: [{'food': 'chinese'}]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "5RqXeLdTTgV4"
   },
   "source": [
    "&nbsp;\n",
    "![gobot_slotfiller.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_slotfiller.png?raw=1)\n",
    "&nbsp;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TcJGPFq4TgV5"
   },
   "source": [
    "To implement a `Slot Filler` you need to provide\n",
    "    \n",
    " - **slot types**,\n",
    " - all possible **slot values**,\n",
    " - also, it is good to have examples of mentions for every value of each slot.\n",
    " \n",
    "In this tutorial, a schema for `slot types` and `slot values` should be defined in `slot_vals.json` with the following format:\n",
    "\n",
    "    {\n",
    "        'food': {\n",
    "            'chinese': ['chinese', 'chineese', 'chines'],\n",
    "            'french': ['french', 'freench'],\n",
    "            'dontcare': ['any food', 'any type of food']\n",
    "        }\n",
    "    }\n",
    "                \n",
    "\n",
    "Let's use a simple non-trainable slot filler that relies on Levenshtein distance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "zVi5XynnTgV6",
    "outputId": "e9d68c8c-3bbb-4f80-98a5-92cbfe0eb5ac"
   },
   "outputs": [],
   "source": [
    "from deeppavlov.download import download_decompress\n",
    "\n",
    "download_decompress(url='http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz',\n",
    "                    download_path='my_bot/slotfill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "NR1S3PXCTgV9",
    "outputId": "013e9dba-427c-4255-aad5-0627477157e8"
   },
   "outputs": [],
   "source": [
    "!ls my_bot/slotfill"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-OZ9TqDKZ6Fv"
   },
   "source": [
    "Print some `slot types` and `slot values`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "KqgfYr4RTgWE",
    "outputId": "a6830aa3-0bcc-4011-a4ab-5b5e48e6a20f"
   },
   "outputs": [],
   "source": [
    "!head -n 10 my_bot/slotfill/dstc_slot_vals.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "eIufDAvATgWN"
   },
   "source": [
    "Check performance of our slot filler on DSTC-2 dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "XUSj5R3uTgWP"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import configs\n",
    "from deeppavlov.core.common.file import read_json\n",
    "\n",
    "slotfill_config = read_json(configs.ner.slotfill_simple_dstc2_raw)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "pFda6_LBTgWT"
   },
   "source": [
    "We take [original DSTC2 slot-filling config](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/configs/ner/slotfill_dstc2_raw.json) from DeepPavlov and change variables determining data paths:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "yr8MbFLwTgWV"
   },
   "outputs": [],
   "source": [
    "slotfill_config['metadata']['variables']['DATA_PATH'] = 'my_data'\n",
    "slotfill_config['metadata']['variables']['SLOT_VALS_PATH'] = 'my_bot/slotfill/dstc_slot_vals.json'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ZxMTySrpaZVP"
   },
   "source": [
    "Run evaluation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "CdrDW4bVTgWZ",
    "outputId": "ac56ae74-b368-437e-c70f-01b418ba883f"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import evaluate_model\n",
    "\n",
    "slotfill = evaluate_model(slotfill_config);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "azulujiLTgWb"
   },
   "source": [
    "We've got slot accuracy of **93% on valid** set and **95% on test** set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FkZvQ-yNig1u"
   },
   "source": [
    "Building `Slot Filler` model from DeepPavlov config."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "uWeXTtVhTgWc"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import build_model\n",
    "\n",
    "slotfill = build_model(slotfill_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ihi4lpXUi-_V"
   },
   "source": [
    "Testing the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "bMRSU_bnTgWf",
    "outputId": "d224e4be-1537-428d-ff67-55076224946d"
   },
   "outputs": [],
   "source": [
    "slotfill(['i want cheap chinee food'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U2PUxB5fTgWl"
   },
   "source": [
    "Saving slotfill config file to disk (we will require it's path later)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "5MyFaEM7TgWl"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "json.dump(slotfill_config, open('my_bot/slotfill_config.json', 'wt'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "colab_type": "code",
    "id": "_ZlRvicuTgWo",
    "outputId": "4f1c3d46-d3b1-4923-823e-e2df1027fc6f"
   },
   "outputs": [],
   "source": [
    "!ls my_bot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "E_InRKO6TgWt"
   },
   "source": [
    "## 3. Build and Train a Bot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ySe2m9-5m6iW"
   },
   "source": [
    "### Dialogue policy and response templates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "qjwbkeDl3TBg"
   },
   "source": [
    "A policy module of the bot decides what action should be taken in the current dialogue state. The policy in our bot is implemented as a recurrent neural network (recurrency over user utterances) followed by a dense layer with softmax function on top. The network classifies user input into one of predefined system actions. Examples of possible actions are to say hello, to request user's location or to make api call to a database. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wLE1iebG3WJc"
   },
   "source": [
    "![gobot_policy.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_policy.png?raw=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ghF-W56m3iW-"
   },
   "source": [
    "All actions available for the system should be listed in a `simple-dstc2-templates.txt` file. Also, every action should be associated with a template string of the corresponding system response."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TjDnGyiN3nIr"
   },
   "source": [
    "![gobot_templates.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_templates.png?raw=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-xqGKtXBTgWu"
   },
   "source": [
    "Templates for responses should be in the format `<act>TAB<template>`, where `<act>` is a dialogue action and `<template>` is the corresponding response. The response text might contain slot type names, where every `#slot_type` will be filled with the slot value from the current dialogue state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "colab_type": "code",
    "id": "bNyliD8PTgWw",
    "outputId": "15434eda-f87e-4b19-ac1a-59c4ba345344"
   },
   "outputs": [],
   "source": [
    "!head -n 10 my_data/simple-dstc2-templates.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TWG40VysTgW0"
   },
   "source": [
    "In essense, the dialogue policy module solves classification task, where a set of classes is defined in `simple-dstc2-templates.txt`. So, to train the dialogue policy network you need action label for each system's turn in training dialogues. The DSTC-2 provides `\"act\"` dictionary key that contains action associated with current response. Here is an example of training data for the policy network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 438
    },
    "colab_type": "code",
    "id": "eeeqFeWkTgW1",
    "outputId": "c05f6279-a589-4b54-be62-0400251e840a"
   },
   "outputs": [],
   "source": [
    "!head -n 24 my_data/simple-dstc2-trn.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "k9d67KtlTgW5"
   },
   "source": [
    "Now we configure a full data processing pipline for the restaurant bot.\n",
    "\n",
    "As a starting point, let's take a [simple DSTC2 bot config](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/configs/go_bot/gobot_simple_dstc2.json) ([more configs](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/configs/go_bot) are available) from DeepPavlov and, then change sections responsible for:\n",
    "- embeddings, \n",
    "- database,\n",
    "- slot filler,\n",
    "- templates,\n",
    "- data and model load/save paths.\n",
    "\n",
    "Loading bot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "db9_ozwnTgW5"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import configs\n",
    "from deeppavlov.core.common.file import read_json\n",
    "\n",
    "gobot_config = read_json(configs.go_bot.gobot_simple_dstc2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "QFQFQ7_bTgXa"
   },
   "source": [
    "Set default bag-of-words embedder:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "fLgZuzQgTgXc"
   },
   "outputs": [],
   "source": [
    "gobot_config['chainer']['pipe'][-1]['embedder'] = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "oNGj-ARxTgW-"
   },
   "source": [
    "Configure bot to use our database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "VanrlHZZTgXB"
   },
   "outputs": [],
   "source": [
    "gobot_config['chainer']['pipe'][-1]['database'] = {\n",
    "    'class_name': 'sqlite_database',\n",
    "    'primary_keys': [\"name\"],\n",
    "    'save_path': 'my_bot/db.sqlite'\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "nQ_YE904TgXQ"
   },
   "source": [
    "Configure bot to use levenshtein distance based slot filler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "rNVubPKdTgXU"
   },
   "outputs": [],
   "source": [
    "gobot_config['chainer']['pipe'][-1]['slot_filler']['config_path'] = 'my_bot/slotfill_config.json'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ug76yN8sWUga"
   },
   "source": [
    "To maintain values of slots of the whole conversation, we first detect slot values mentioned in the latest utterance and then apply `tracker` module which updates current global slot values, so called dialogue state:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "KJ8P-mHOWtZq"
   },
   "outputs": [],
   "source": [
    "gobot_config['chainer']['pipe'][-1]['tracker']['slot_names'] = ['pricerange', 'this', 'area', 'food']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "6l6H_t1iTgW7"
   },
   "source": [
    "Configure bot to use templates:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "209m3f6yTgW8"
   },
   "outputs": [],
   "source": [
    "gobot_config['chainer']['pipe'][-1]['nlg_manager']['template_type'] = 'DefaultTemplate'\n",
    "gobot_config['chainer']['pipe'][-1]['nlg_manager']['template_path'] = 'my_data/simple-dstc2-templates.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "HMXih1roTgXi"
   },
   "source": [
    "Specify train/valid/test data path and path to save the final bot model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "vTUdrrQVTgXi"
   },
   "outputs": [],
   "source": [
    "gobot_config['metadata']['variables']['DATA_PATH'] = 'my_data'\n",
    "gobot_config['metadata']['variables']['MODEL_PATH'] = 'my_bot'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "2WDWC18cTgXm"
   },
   "source": [
    "The whole dialogue system pipeline looks like this:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "nQmue0vIGdA0"
   },
   "source": [
    "    \n",
    "![gobot_pipeline.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_pipeline.png?raw=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "X0cFEvYTTgXo"
   },
   "source": [
    "### Training policy network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "id": "X9wXKXuHTgXr",
    "outputId": "3ae99954-a888-40fb-8c21-74e3a5876a5d"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import train_model\n",
    "\n",
    "gobot_config['train']['batch_size'] = 8 # batch size\n",
    "gobot_config['train']['max_batches'] = 250 # maximum number of training batches\n",
    "gobot_config['train']['log_on_k_batches'] = 20\n",
    "gobot_config['train']['val_every_n_batches'] = 40 # evaluate on full 'valid' split each n batches\n",
    "gobot_config['train']['log_every_n_batches'] = 40 # evaluate on 20 batches of 'train' split every n batches\n",
    "\n",
    "train_model(gobot_config);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "fGGRzuacTgX0"
   },
   "source": [
    "Training on 50 dialogues takes from 5 to 20 minutes depending on gpu/cpu hardware. Training on 1000 dialogues takes 10-30 mins.\n",
    "\n",
    "See DeepPavlov [config doc page](http://docs.deeppavlov.ai/en/master/intro/configuration.html) for advanced configuration of the training process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ldfDa9dUTgX1"
   },
   "source": [
    "### Evaluation of training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "k-z7wZjOTgX6"
   },
   "source": [
    "Calculating **accuracy** of trained bot: whether predicted system responses match true responses (full string match)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 338
    },
    "colab_type": "code",
    "id": "EpPmQkTvTgX8",
    "outputId": "60071240-296f-4cec-d7e8-da2371e0fe90",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from deeppavlov import evaluate_model\n",
    "\n",
    "evaluate_model(gobot_config);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "1wZOqmYBTgYB"
   },
   "source": [
    "With settings of `max_batches=250`, valid accuracy `= 0.5` and test accuracy is `~ 0.5`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ElGD1tnJTgYC"
   },
   "source": [
    "## 4. Interact with Bot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "colab_type": "code",
    "id": "m9sJXOPPTgYF",
    "outputId": "34c4627f-6be1-44d3-893e-5c65f7bd3689"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import build_model\n",
    "\n",
    "bot = build_model(gobot_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "DXSYe_S1TgYL",
    "outputId": "2afc564c-45df-45b5-a5f3-68d9fae1b287"
   },
   "outputs": [],
   "source": [
    "bot(['hi, i want to eat, can you suggest a place to go?'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 70
    },
    "colab_type": "code",
    "id": "zfYYMHFATgYO",
    "outputId": "dbf7c44b-d4b9-4ac0-808e-d84e427dc7c1"
   },
   "outputs": [],
   "source": [
    "bot(['i want cheap food'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 70
    },
    "colab_type": "code",
    "id": "7lqjO0qXTgYQ",
    "outputId": "1a2fd9ef-d023-45f4-cea7-0b88c704c9a6"
   },
   "outputs": [],
   "source": [
    "bot(['chinese food'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "sdSgK3JFTgYV",
    "outputId": "2deddb0b-1bc1-432e-dcf6-22c39fa21832"
   },
   "outputs": [],
   "source": [
    "bot(['thanks, give me their address'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "bWWfPY7FTgYb",
    "outputId": "17fc9dae-f768-4875-e81d-e72bf80c7372"
   },
   "outputs": [],
   "source": [
    "bot(['i want their phone number too'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 33
    },
    "colab_type": "code",
    "id": "0nSDyEXwTgYe",
    "outputId": "46260e7d-dd83-4415-b95a-95246d6176bb"
   },
   "outputs": [],
   "source": [
    "bot(['bye'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "MlwvZCPJTgYh"
   },
   "outputs": [],
   "source": [
    "bot.reset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 53
    },
    "colab_type": "code",
    "id": "tvTZI4PKTgYl",
    "outputId": "e75d3136-4581-451d-d121-829c7dd28256"
   },
   "outputs": [],
   "source": [
    "bot(['hi, is there any cheap restaurant?'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also train a Simple bot following [gobot_tutorial.ipynb](https://github.com/deepmipt/DeepPavlov/blob/master/examples/gobot_tutorial.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "7An-mcu8TgYq"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [
    "l5mjRphbTgVb",
    "n597CLhqjqcd",
    "XeJMI9qaTgVt",
    "mBoO34NzTgV4"
   ],
   "name": "gobot_tutorial(MB).ipynb",
   "provenance": [],
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "accelerator": "GPU",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
